Dictionary-Based Fast Transform for Text Compression with High Compression Ratio

Authors

  • Weifeng Sun
  • Amar Mukherjee
Abstract

In this paper we introduce a dictionary-based fast lossless text transform algorithm. The algorithm uses a ternary search tree to speed up transform encoding. Based on an efficient dictionary mapping model, it uses a fast hash function to achieve lightning speed in the transform decoding phase. Results show that the average compression time using the transform algorithm with bzip2 -9, gzip -9 and PPMD is 28.1% slower, 50.4% slower and 21.2% faster compared to the original bzip2 -9, gzip -9 and PPMD respectively. Meanwhile, the overhead in the decompression phase is negligible. With our proposed transform algorithm, bzip2 -9 and PPMD both achieve better compression performance than most of the improvements on the BWT and PPM based compression algorithms. In particular, we conclude that bzip2 in conjunction with this transform is better than PPMD in both time complexity and compression performance. This transform is especially suitable for domain-specific lossless text compression.
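
The abstract names two mechanisms: a ternary search tree for looking up words during encoding, and a fast hash from codeword to dictionary position during decoding. The following is only a minimal sketch of that idea, not the authors' implementation. The 52-letter codeword alphabet, the `*` escape for words missing from the dictionary, the single-space tokenization, and all function and class names are assumptions made for illustration.

```python
import string

# Assumed codeword alphabet: 52 letters. Not taken from the abstract.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)

def index_to_code(i):
    """Map a dictionary position to a short letter codeword (bijective base 52)."""
    n, out = i + 1, []
    while n > 0:
        n, r = divmod(n - 1, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def code_to_index(code):
    """Inverse of index_to_code: plain arithmetic, so decoding needs no search."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch) + 1
    return n - 1

class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "code")
    def __init__(self, ch):
        self.ch, self.code = ch, None
        self.lo = self.eq = self.hi = None

class TernarySearchTree:
    """Maps dictionary words to codewords for the encoding direction."""
    def __init__(self):
        self.root = None

    def insert(self, word, code):
        if word:  # empty words are not stored
            self.root = self._insert(self.root, word, 0, code)

    def _insert(self, node, word, i, code):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, code)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, code)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, code)
        else:
            node.code = code
        return node

    def lookup(self, word):
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.code
        return None

def build(words):
    """words: dictionary list, assumed ordered so frequent words get short codes."""
    tree, table = TernarySearchTree(), list(words)
    for i, w in enumerate(table):
        tree.insert(w, index_to_code(i))
    return tree, table

def encode(text, tree):
    out = []
    for tok in text.split(" "):
        code = tree.lookup(tok)
        # Assumed escape: unknown words are passed through with a '*' prefix.
        out.append(code if code is not None else "*" + tok)
    return " ".join(out)

def decode(text, table):
    out = []
    for tok in text.split(" "):
        out.append(tok[1:] if tok.startswith("*") else table[code_to_index(tok)])
    return " ".join(out)

tree, table = build(["the", "of", "and", "compression"])
encoded = encode("the art of compression", tree)
print(encoded)
print(decode(encoded, table))
```

In this sketch the transformed text would then be handed to a backend compressor such as bzip2, gzip or PPMD. Decoding only computes an array index from each codeword, which is one way to read the abstract's claim that decompression overhead is negligible.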

Related resources

A Dictionary-Based Multi-Corpora Text Compression System

In this paper we introduce StarZip, a multi-corpora text compression system, together with its transform engine StarNT. StarNT achieves a better compression ratio than almost all the other recent efforts based on BWT and PPM. StarNT is a dictionary-based fast lossless text transform. The main idea is to recode each English word with a representation of no more than three symbols. This transfo...

A Novel Multidictionary Based Text Compression

The amount of digital content is growing rapidly, and so is the demand for communicating it. The amount of storage and bandwidth, on the other hand, increases at a slower rate. Powerful and efficient compression methods are therefore required. The repetition of words and phrases makes the reordered text much more compressible than the original text. On the whole, the system is fast and achieve...

Dictionary-Based Fast Transform for Text Compression

In this paper we present StarNT, a dictionary-based fast lossless text transform algorithm. With a static generic dictionary, StarNT achieves a better compression ratio than almost all the other recent efforts based on BWT and PPM. The algorithm uses a ternary search tree to speed up transform encoding. Experimental results show that the average compression time has improved by orders of m...

Enhancing Dictionary Based Preprocessing For Better Text Compression

With the rapid growth of data and of the number of applications, there is a crucial need for dictionary-based reversible transformation techniques that increase the efficiency of compression algorithms and hence improve the compression ratio. Performance analysis of compression methods in combination with the various transformation techniques is obtained for different text ...

Preprocessing of XML file via dictionary method for faster data compression and decompression

Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Data compression offers an attractive approach to reducing communication costs by using available bandwidth effectively. Dictionary-based code compression techniques are popular as they offer both a good compression ratio and a fast decompression scheme. XML is a popular self describi...

Publication date: 2002